Abstract

Broadcasting is known to be an efficient means of disseminating data in wireless communication
environments (such as satellite or mobile phone networks). It has recently been observed
that the average service time of broadcast systems can be considerably improved by taking
into consideration existing correlations between requests. We study a pull-based data broadcast
system where users request possibly overlapping sets of items; a request is served when all its
requested items are downloaded. We aim at minimizing the average user-perceived latency, i.e.,
the average flow time of the requests. We first show that any algorithm that ignores the
dependencies can yield arbitrarily bad performance with respect to the optimum, even if it is given
arbitrary extra resources. We then design a $(4+\epsilon)$-speed $O(1+1/\epsilon^2)$-competitive
algorithm for this setting that consists in 1) splitting the bandwidth evenly among the requested
sets and 2) broadcasting arbitrarily the items still missing in each set within the bandwidth the
set has received. Our algorithm presents several interesting features: it is simple to implement,
non-clairvoyant, fair to users so that no user may starve for a long period of time, and guarantees
good performance in the presence of correlations between user requests (without any change in the
broadcast protocol). We also present a $(4+\epsilon)$-speed $O(1+1/\epsilon^3)$-competitive
algorithm which broadcasts at most one item at any given time and preempts each item broadcast at
most once on average. As a side result of our analysis, we design a competitive algorithm for a
particular setting of non-clairvoyant job scheduling with dependencies, which might be of
independent interest.

Keywords: Multicast scheduling, Pull-based broadcast, Correlation-based, Non-clairvoyant
scheduling, Resource augmentation.



Omitted proofs, lemmas, notes and figures 
1 Introduction



Motivations. Broadcasting is known to be an efficient means of disseminating data in wireless
communication environments (such as satellite or mobile phone networks). It has recently been
observed in ^3 E] that the average service time of broadcast systems can be considerably
improved by taking into consideration existing correlations between requests. Until very recently,
most of the theoretical research on data broadcasting was conducted under the assumption that user
requests are for a single item at a time and are independent of each other. However, users usually
request several items at a time which are, to a large extent, correlated. A typical example is a web
server: users request web pages that are composed of many shared components such as logos,
style sheets, title bars, and news headers, and all these components have to be downloaded together
when any individual page is requested. Note that some of these components, e.g., news headers,
may constantly vary over time (in size and/or content).

Pull-based data broadcast with dependencies. We study a pull-based data broadcast 
system where users request possibly overlapping sets of items. We aim at minimizing the average 
user perceived latency, i.e. the average flow time of the requests, where the flow time of a request 
is defined as the time elapsed between its arrival and the end of the download of the last requested 
item. We assume that a user cannot start downloading an item in the middle of its broadcast.
When the broadcast of an item starts, all the outstanding requests asking for this item can 
start downloading it. Several items may be downloaded simultaneously. We consider the online 
setting where the scheduler is non-clairvoyant and discovers each request at the time of its arrival;
furthermore, the scheduler does not even know the lengths of the requested items and is aware of 
the completion of a broadcast only at the time of its completion. Items are however labeled with a 
unique ID to allow their retrieval. Note that these are the typical requirements of real-life systems
where items may vary over time.

Background. It is well known that preemption is required in such systems in order to achieve
reasonable performance. Furthermore, [Jj proved that even without dependencies, no algorithm
can guarantee a flow time less than $\Omega(\sqrt{n})$ times the optimal. The traditional approach
in online algorithms then consists in penalizing the optimum by increasing the bandwidth given to
the algorithm so that its performance can be compared to the optimum. This technique is known as
resource augmentation and provides interesting insights on the relative performance of different
algorithms that could not be compared directly to the optimum cost. In our case, we give our
algorithm a bandwidth $s > 1$ and show that it achieves a flow time less than a constant times the
optimum cost with bandwidth 1. Formally, an algorithm is $s$-speed $c$-competitive if, when given a
bandwidth $s$, its flow time is at most $c$ times the optimum flow time with bandwidth 1.

To our knowledge, the only positive results [HI in the online setting assume that the requests
are independent and ask for one single item. The authors show that without dependencies the
algorithms Equi and LWF are competitive. Equi, which splits the bandwidth evenly among the
alive requested items, is $(4+\epsilon)$-speed $(2+8/\epsilon)$-competitive, and LWF, which
broadcasts the item for which the aggregate waiting time of the outstanding requests is maximized,
is 6-speed $O(1)$-competitive (where the bound proved on the competitive ratio is
$O(1) = 6{,}000{,}000$). In the offline setting, where the requests and their arrival times are
known at time $t = 0$, the problem is already NP-hard but better bounds can be obtained using
linear programming ^1 EJl El Ql 12]; the latest result, [2] to our knowledge, is an
$O(\log^2(T+n)/\log\log(T+n))$-approximation where $n$ is the number of requests and $T$ the
arrival time of the last request. To our knowledge, our results are the first provably efficient
algorithms to deal with dependencies in the online setting.

Concerning the push-based variant of the problem, where the request arrival times follow some
Poisson process and the requested sets are identically distributed according to a fixed distribution,
constant factor approximations exist in the presence of dependencies 01 El-. The latest result,
[H], obtains a 4-approximation if the requested sets are drawn according to an arbitrary fixed
distribution over a finite number of subsets of items.

Our contribution. We first show that the performance of any algorithm that ignores the
dependencies can be arbitrarily far from the optimal cost even if it is given arbitrary extra
resources. We then design a $(4+\epsilon)$-speed $O(1+1/\epsilon^2)$-competitive algorithm,
B-EquiSet, for the non-clairvoyant data broadcast problem with dependencies. B-EquiSet consists in
1) splitting the bandwidth evenly among the requested sets and 2) broadcasting arbitrarily the
items still missing in each set within the bandwidth the set has received. The spirit of the
algorithm is that one should favor the users over the items: it splits the bandwidth evenly among
the outstanding requested sets and arbitrarily among the outstanding items within each requested
set. Our algorithm presents several interesting features: it is simple to implement,
non-clairvoyant, fair to users so that no user may starve for a long period of time, and improves
performance in the presence of correlations between user requests (without any change in the
broadcast protocol). Precisely, we prove that:

Theorem 1 (Main result) For all $\delta > 0$ and $\epsilon > 0$, B-EquiSet is a
$(1+\delta)(4+\epsilon)$-speed $(2+8/\epsilon)(1+1/\delta)$-competitive algorithm for the online
data broadcast problem with dependencies.
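
To get a feel for the trade-off in Theorem 1, the following minimal sketch evaluates the stated speed $(1+\delta)(4+\epsilon)$ and competitive ratio $(2+8/\epsilon)(1+1/\delta)$ for a few parameter choices. The function name `b_equiset_bounds` is a hypothetical helper introduced here for illustration only; it merely evaluates the two expressions of the theorem.

```python
from fractions import Fraction

def b_equiset_bounds(eps, delta):
    """Evaluate the (speed, competitive ratio) pair stated in Theorem 1."""
    eps, delta = Fraction(eps), Fraction(delta)
    if eps <= 0 or delta <= 0:
        raise ValueError("eps and delta must be positive")
    speed = (1 + delta) * (4 + eps)           # (1 + delta)(4 + eps)
    ratio = (2 + 8 / eps) * (1 + 1 / delta)   # (2 + 8/eps)(1 + 1/delta)
    return speed, ratio

# Smaller eps and delta bring the speed closer to 4 but inflate the ratio.
for eps, delta in [(4, 1), (1, 1), (Fraction(1, 2), Fraction(1, 2))]:
    speed, ratio = b_equiset_bounds(eps, delta)
    print(f"eps={eps}, delta={delta}: speed={speed}, ratio={ratio}")
```

Choosing $\delta = \epsilon$ makes the ratio grow like $1/\epsilon^2$ as $\epsilon$ tends to 0, while the speed tends to 4.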

One could object that B-EquiSet is unrealistic since it can split the bandwidth arbitrarily.
But using the same technique as in [7], it is easy to modify B-EquiSet to obtain another competitive
algorithm, B-EquiSet-Edf (described later in the paper), which, with a slight increase of
bandwidth, ensures that at most one item is broadcast at any given time and that each broadcast
is preempted at most once on average.

Theorem 2 (Bounded preemption) For all $\delta > 0$ and $\epsilon > 0$, B-EquiSet-Edf is a
$(1+\delta)^2(4+\epsilon)$-speed $(2+8/\epsilon)(1+1/\delta)^2$-competitive algorithm for the online
data broadcast problem with dependencies, where each broadcast is preempted at most once on average.

Our analysis is inspired by the methods developed in [7]. In order to extend their analysis
to our algorithm, we have also designed a new competitive algorithm, Equi∘A, for a particular
setting of non-clairvoyant job scheduling with dependencies, which might be of independent interest
(Theorem 13).

The next section gives a formal description of the problem and shows that it is required to take
dependencies into account to obtain a competitive algorithm. Section 3 presents the algorithm
B-EquiSet and introduces useful notations. Section 4 designs a competitive algorithm Equi∘A for a
variant of job scheduling with dependencies that is used in Section 5 to analyze the competitiveness
of our algorithm B-EquiSet.

2 Definitions and notations 

The problem. The input consists of: 

• A set $\mathcal{I}$ of $n$ items $I_1, \dots, I_n$, of lengths $\ell_1, \dots, \ell_n$.

• A set $\mathcal{S}$ of $q$ requests for $q$ non-empty sets of items $S_1, \dots, S_q \subseteq \mathcal{I}$, with arrival times $a_1, \dots, a_q$.

Schedule. An $s$-speed schedule is an allocation of a bandwidth of size $s$ to the items of
$\mathcal{I}$ over time. Formally, it is described by a function
$r : \mathcal{I} \times [0, \infty) \to [0, s]$ such that for all times $t$,
$\sum_{I \in \mathcal{I}} r(I, t) \le s$; $r(I, t)$ represents the rate of the broadcast of $I$ at
time $t$, i.e., the amount of bandwidth allotted to item $I$ at time $t$. An item $I_i$ is broadcast
between $t$ and $t'$ if its broadcast starts at time $t$ and if the total bandwidth allotted to
$I_i$ between $t$ and $t'$ sums up to $\ell_i$, i.e., if $\int_t^{t'} r(I_i, \tau)\,d\tau = \ell_i$.
We denote by $c(I_i, k)$ the date of the completion of the $k$th broadcast of item $I_i$. Formally,
it is the first date $t'$ such that $\int_0^{t'} r(I_i, t)\,dt = k\ell_i$ (note that
$c(I_i, 0) = 0$). We denote by $b(I_i, k)$ the date of the beginning of the $k$th broadcast of item
$I_i$, i.e., $b(I_i, k) = \inf\{t > c(I_i, k-1) : r(I_i, t) > 0\}$.¹

Cost. For all times $t$, let $B(I_i, t)$ be the time of the beginning of the first broadcast of
item $I_i$ after $t$, i.e., $B(I_i, t) = \min\{b(I_i, k) : b(I_i, k) \ge t\}$. For all times $t$,
$C(I_i, t)$ denotes the time of the end of the first broadcast of item $I_i$ starting after $t$,
i.e., $C(I_i, t) = \min\{c(I_i, k) : b(I_i, k) \ge t\}$. The completion time $c_j$ of request $S_j$
is the first time such that every item in $S_j$ has been broadcast (or downloaded) after its
arrival time $a_j$, i.e., $c_j = \max_{I_i \in S_j} C(I_i, a_j)$. We aim at minimizing the average
completion time defined as $\frac{1}{q} \sum_{S_j \in \mathcal{S}} (c_j - a_j)$, or equivalently the
flow time defined as the sum of the waiting times, i.e.,
B-FlowTime $= \sum_{S_j \in \mathcal{S}} (c_j - a_j)$. We denote by $\mathrm{BOPT}_s(\mathcal{S})$
the flow time of an optimal $s$-speed schedule for a given instance $\mathcal{S}$.
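
As an illustration of the cost definition, here is a minimal sketch that computes the flow time of a given schedule, assuming the schedule is summarized by the (beginning, completion) dates of each broadcast of each item. The function name `flow_time` and the small instance are hypothetical and chosen for illustration; the schedule used below is just a feasible unit-bandwidth schedule, not claimed to be optimal.

```python
def flow_time(broadcasts, requests):
    """Sum over all requests of (completion time - arrival time).

    broadcasts: dict item -> list of (begin, completion) dates, one pair per broadcast
    requests:   list of (arrival_time, set_of_items)
    Raises ValueError if some requested item is never broadcast after the arrival.
    """
    total = 0
    for arrival, items in requests:
        # C(I, a_j): completion of the first broadcast of I that begins at or after a_j.
        # Broadcasts of one item do not overlap, so the earliest completion among the
        # broadcasts beginning at or after a_j is the completion of the first of them.
        c_j = max(
            min(c for (b, c) in broadcasts[item] if b >= arrival)
            for item in items
        )
        total += c_j - arrival
    return total

# Three items of length 1.5 broadcast one after the other with unit bandwidth.
broadcasts = {"A": [(1, 2.5)], "B": [(2.5, 4)], "C": [(4, 5.5)]}
requests = [(0, {"A", "B", "C"}), (1, {"A"}), (2, {"B"}), (3, {"C"})]
print(flow_time(broadcasts, requests))  # 5.5 + 1.5 + 2 + 2.5
```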

$s$-Speed $c$-Competitive Algorithms. We consider the online setting of the problem, in which
the scheduler gets informed of the existence of each request $S_j$ at time $a_j$ and not before. The
scheduler is not even aware of the lengths $(\ell_i)_{I_i \in S_j}$ of the requested items in each
set nor of the total number $n$ of available items. It is well known (e.g., see [Zj) that in this
setting, it is impossible to approximate within a factor $o(\sqrt{n})$ the optimum flow time for a
given bandwidth $s$ even if all items have unit length (independently of any conjecture such as
P = NP). The traditional approach in online algorithms then consists in penalizing the optimum by
increasing the bandwidth given to the algorithm so that its performance can be compared to the
optimum. This technique is known as resource augmentation and provides interesting insights on the
relative performance of different algorithms that could not be compared directly to the optimum
cost. In our case, we give our algorithm a bandwidth $s > 1$ and show that it achieves a flow time
less than a constant times the optimum cost with bandwidth 1. Formally, an algorithm is $s$-speed
$c$-competitive if, when given $s$ times as many resources as the adversary, its cost is no more
than $c$ times the optimum cost. In our case the resource is the bandwidth, and we compare the cost
$A_s$ of a scheduler $A$ with a bandwidth $s$ to the cost $\mathrm{BOPT}_1$ of an optimal schedule
on a unit bandwidth. (We denote by $A_s$ the cost of an algorithm $A$ when given a bandwidth $s$.)
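
In symbols, and under the notation of this section, the definition above can be written as follows (writing $A_s(\mathcal{S})$ for the cost of $A$ with bandwidth $s$ on instance $\mathcal{S}$, which is a slight extension of the notation $A_s$):

```latex
A \text{ is } s\text{-speed } c\text{-competitive}
\quad\Longleftrightarrow\quad
A_s(\mathcal{S}) \;\le\; c \cdot \mathrm{BOPT}_1(\mathcal{S})
\quad \text{for every instance } \mathcal{S}.
```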

We show below that ignoring existing dependencies can lead to arbitrarily bad solutions. 

Fact 3 (Dependencies cannot be ignored) No algorithm A that ignores dependencies is s- 
speed c- competitive for any c < if A is deterministic, and for any c < -^y/n if A is ran- 

domized. 

Proof. Consider first a deterministic algorithm $A$ which is given a bandwidth $s$ and consider the
instance where $n$ different items are requested at time $t = 0$. Since $A$ ignores the
dependencies, we set them after the execution of the algorithm $A$: one request asks for the
$n - \sqrt{n}$ items that have been served the most by $A$ at time $t = (n - \sqrt{n})/s$, and
$\sqrt{n}$ requests each ask for one of the remaining $\sqrt{n}$ items. Then, algorithm $A$ serves
each request only after time $t = (n - \sqrt{n})/s$ and its flow time is at least
$(\sqrt{n} + 1)(n - \sqrt{n})/s \sim n\sqrt{n}/s$. The optimal solution with bandwidth only 1 first
broadcasts the items corresponding to the $\sqrt{n}$ unit length requests and then broadcasts the
$n - \sqrt{n}$ remaining items; the optimal flow time is then
$n + \sum_{i=1}^{\sqrt{n}} i \sim \frac{3}{2} n$. This shows a gap of order $\frac{2}{3s}\sqrt{n}$
between the optimal cost with bandwidth 1 and every deterministic algorithm with bandwidth
$s = o(\sqrt{n})$ which ignores the dependencies. We extend the result to randomized algorithms
thanks to Yao's principle QSIEI (Omitted). □

¹ Remark that this formalization prevents broadcasting the same item twice at a given time and
aborting the current broadcast of an item. The first point is not restrictive since if two
broadcasts of the same item overlap, one reduces the service time by using the beginning of the
bandwidth allotted to the second broadcast to complete the first earlier, and then the end of the
first to complete the second on time. The second point is at our strict disadvantage since it does
not penalize an optimal schedule that would never start a broadcast to abort it later on.
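
The two quantities compared in the proof of Fact 3 can be evaluated numerically. The sketch below assumes unit-length items and $n$ a perfect square; the function name `fact3_gap` is hypothetical. It only evaluates the lower bound $(\sqrt{n}+1)(n-\sqrt{n})/s$ on the flow time of $A$ and the flow time $n + \sum_{i=1}^{\sqrt{n}} i$ of the unit-bandwidth schedule described in the proof.

```python
from math import isqrt

def fact3_gap(n, s=1.0):
    """Return (lower bound on A's flow time, flow time of the schedule of the proof, ratio)."""
    r = isqrt(n)
    if r * r != n:
        raise ValueError("n must be a perfect square")
    # All r + 1 requests are still unserved at time (n - r) / s.
    alg_lower = (r + 1) * (n - r) / s
    # Singleton requests finish at times 1, ..., r; the large request finishes at time n.
    opt = n + r * (r + 1) // 2
    return alg_lower, opt, alg_lower / opt

for n in (100, 10_000, 1_000_000):
    lower, opt, ratio = fact3_gap(n)
    print(n, lower, opt, round(ratio, 2))
```

For a fixed bandwidth $s$, the ratio grows with $n$ roughly like $\frac{2}{3s}\sqrt{n}$.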



3 The Algorithm B-EquiSet 

Definitions. A request $S_j$ for a subset of items is said to be alive at time $t$ if $t \ge a_j$
and if the download of at least one item $I_i \in S_j$ is not yet completed at time $t$, i.e.,
$t < C(I_i, a_j)$. We say that an item $I_i \in S_j$ whose download is not yet completed (i.e.,
such that $a_j \le t < C(I_i, a_j)$) is alive for $S_j$ at time $t$.

The B-EquiSet Algorithm. Consider that we are given a bandwidth $s$. Let $R(t)$ be the set
of alive requests at time $t$ during the execution of the algorithm. For all $t$, B-EquiSet
allocates to each alive request the same amount of bandwidth, $s/|R(t)|$; then, for each alive
request $S_j$, it splits arbitrarily the $s/|R(t)|$ bandwidth allotted to $S_j$ among its alive
items. Precisely, it allocates to each item $I_i$ alive for $S_j$ at time $t$ an arbitrary amount
of bandwidth, $r_{j,i}(t) \ge 0$, such that
$\sum_{I_i \text{ alive for } S_j} r_{j,i}(t) = s/|R(t)|$. B-EquiSet then broadcasts at time $t$
each item $I_i$ at a rate
$r_i(t) = \sum_{S_j \in R(t) \,:\, I_i \text{ is alive for } S_j \text{ at time } t} r_{j,i}(t)$.

Figure 1 illustrates an execution of the algorithm, in which B-EquiSet chooses, for each alive
request $S_j$, to divide up the bandwidth allotted to $S_j$ equally among all of $S_j$'s alive items.



[Figure 1, two panels. Left: B-EquiSet allocation with bandwidth s = 1.5, FlowTime = 14.67.
Right: optimal allocation with bandwidth s = 1.0, FlowTime = 11.]

The instance consists of three items A, B, C of length 1.5 and four requests $S_1 = \{A, B, C\}$
(in red), $S_2 = \{A\}$ (in green), $S_3 = \{B\}$ (in blue), and $S_4 = \{C\}$ (in yellow) with
arrival times $a_1 = 0$, $a_2 = 1$, $a_3 = 2$, and $a_4 = 3$. Two schedules are presented:
B-EquiSet with bandwidth $s = 1.5$ (to the left) and an optimal schedule with unit bandwidth (to
the right). Time flows downwards. Four lines to the right of each schedule represent each request's
lifetime; the bandwidth allotted to each request is outlined in its respective color. B-EquiSet
first allots all the bandwidth to $S_1$ and splits it evenly among its items A, B and C (items A,
B, and C get darker and darker as their broadcasts progress). At time 1, $S_2$ arrives and
B-EquiSet splits the bandwidth evenly between $S_1$ and $S_2$, thus item A is broadcast at a rate
$1.5 \times (\frac{1}{6} + \frac{1}{2}) = 1$ and its broadcast completes at time 2. At time 2,
$S_3$ arrives, and B-EquiSet splits the bandwidth evenly between $S_1$, $S_2$ and $S_3$; $S_1$ has
completed its download of A, thus B-EquiSet splits the bandwidth allotted to $S_1$ among B and C
only; $S_2$ was too late to download A, so it starts a new broadcast of A. $S_1$, $S_2$, $S_3$, and
$S_4$ are finally served at times $3 + \frac{2}{3}$, $5 + \frac{1}{6}$, $5 + \frac{5}{6}$ and 6,
for a total flow time B-EquiSet$_{1.5}(\mathcal{S}) = 14 + \frac{2}{3}$ whereas
$\mathrm{BOPT}_1 = 11$.

Figure 1: A 1.5-speed execution of a B-EquiSet algorithm.



Note that bandwidth adjustments for each item are necessary only when new requests arrive or 
when the broadcast of some item completes. 
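
Since rates only change at arrivals and at broadcast completions, B-EquiSet can be simulated event by event. The following is a minimal sketch, not the authors' implementation: it assumes positive item lengths and fixes the arbitrary split inside each request to an equal split among its alive items (the choice made in Figure 1). The function name `b_equiset` and the data layout are assumptions made for illustration; exact rational arithmetic is used so that completion events are detected without rounding issues.

```python
from fractions import Fraction

def b_equiset(lengths, requests, s):
    """Simulate B-EquiSet with an equal split inside each request.

    lengths:  dict item -> (positive) length
    requests: list of (arrival_time, set_of_items)
    s:        bandwidth
    Returns the completion times c_j, in the order of `requests`.
    """
    s = Fraction(s)
    lengths = {i: Fraction(l) for i, l in lengths.items()}
    order = sorted(range(len(requests)), key=lambda j: requests[j][0])
    progress = {i: Fraction(0) for i in lengths}  # work done in the current broadcast
    need = {}  # alive request -> {alive item: can it download the current broadcast?}
    done = [None] * len(requests)
    t, nxt = Fraction(0), 0
    while nxt < len(order) or need:
        # Admit arrivals; a request can only download a broadcast that has not started.
        while nxt < len(order) and requests[order[nxt]][0] <= t:
            j = order[nxt]
            need[j] = {i: progress[i] == 0 for i in requests[j][1]}
            nxt += 1
        if not need:  # nothing alive: idle until the next arrival
            t = Fraction(requests[order[nxt]][0])
            continue
        # Equal share per alive request, split equally among its alive items.
        rate = {i: Fraction(0) for i in lengths}
        share = s / len(need)
        for items in need.values():
            for i in items:
                rate[i] += share / len(items)
        # Advance to the next event: a broadcast completion or an arrival.
        dt = min((lengths[i] - progress[i]) / rate[i] for i in lengths if rate[i] > 0)
        if nxt < len(order):
            dt = min(dt, Fraction(requests[order[nxt]][0]) - t)
        t += dt
        for i in lengths:
            progress[i] += rate[i] * dt
            if progress[i] == lengths[i]:  # the broadcast of item i completes at time t
                progress[i] = Fraction(0)
                for j in list(need):
                    if i in need[j]:
                        if need[j][i]:
                            del need[j][i]     # item downloaded by request j
                        else:
                            need[j][i] = True  # j waits for the broadcast starting now
                        if not need[j]:
                            del need[j]
                            done[j] = t
    return done

# The instance of Figure 1: three items of length 1.5, bandwidth s = 1.5.
lengths = {"A": Fraction(3, 2), "B": Fraction(3, 2), "C": Fraction(3, 2)}
requests = [(0, {"A", "B", "C"}), (1, {"A"}), (2, {"B"}), (3, {"C"})]
completions = b_equiset(lengths, requests, Fraction(3, 2))
flow = sum(c - a for c, (a, _) in zip(completions, requests))
print(completions, flow)
```

On this instance the sketch yields a total flow time of $14 + \frac{2}{3}$, in agreement with the value reported for Figure 1. Note that the bandwidth of a request that arrived too late for the current broadcast of an item still speeds up that broadcast, since the item is alive for the request.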
As in [7], we deduce the performance of our broadcast algorithm B-EquiSet from the analysis
of the performance of another algorithm, Equi∘A, for a variant of the non-clairvoyant scheduling
problem studied in [8] which includes dependencies. Section 4 presents this latter problem and
analyzes the competitiveness of algorithm Equi∘A. Then, Section 5 deduces the competitiveness
of B-EquiSet by simulating Equi∘A on a particular instance of non-clairvoyant scheduling built
on the execution of B-EquiSet.

4 Non-Clairvoyant Seq-Par Batch Scheduling

For the sake of completeness we first sum up the results in [8]; the reader may skip this paragraph
on a first reading. Edmonds's non-clairvoyant scheduling problem consists in designing an online
algorithm that schedules jobs on $p$ processors without any knowledge of the progress of each job
before its completion. An instance of the non-clairvoyant job scheduling problem consists in a
collection of jobs $(J_k)$ with arrival times $(a_k)$; each job $J_k$ goes through a series of
phases $J_k^1, \dots, J_k^{m_k}$; the amount of work in each phase $J_k^l$ is $w_k^l$; at time $t$,
the algorithm allocates to each uncompleted job $J_k$ an amount $\rho_k^t$ of processors (the
$(\rho_k^t)$s are arbitrary non-negative real numbers, such that at any time:
$\sum_k \rho_k^t \le p$); each phase $J_k^l$ progresses at a rate given by a speed-up function
$\Gamma_k^l(\rho_k^t)$ of the amount $\rho_k^t$ of processors allotted to $J_k$ during phase
$J_k^l$, that is to say that the amount of work accomplished between $t$ and $t + dt$ during phase
$J_k^l$ is $\Gamma_k^l(\rho_k^t)\,dt$; let $t_k^l$ denote the completion time of the $l$-th phase
of $J_k$, i.e., $t_k^l$ is the first time $t'$ such that
$\int_{t_k^{l-1}}^{t'} \Gamma_k^l(\rho_k^t)\,dt = w_k^l$ (with $t_k^0 = a_k$).

The overall goal is to minimize the flow time of the jobs, that is to say the sum of the processing
times of the jobs, i.e., J-FlowTime $= \sum_k (t_k^{m_k} - a_k)$. We denote by
$\mathrm{JOPT}_s(\mathcal{J})$ the flow time of an optimal $s$-speed schedule for $\mathcal{J}$.
The algorithm is non-clairvoyant in the sense that it does not know anything about the progress of
each job and is only informed that a job is completed at the time of its completion. In particular,
it is not aware of the different phases that the job goes through (neither of the amount of work
nor of the speed-up function). One of the striking results of [8] is that in spite of this total
lack of knowledge, the algorithm Equi that allocates an equal amount of processors to each
uncompleted job is $(2+\epsilon)$-speed $(2+4/\epsilon)$-competitive when the speed-up functions
are arbitrary non-decreasing sub-linear functions (i.e., such that for all $\rho < \rho'$,
$\Gamma_k^l(\rho)/\rho \ge \Gamma_k^l(\rho')/\rho'$, for all $k$, $l$).

Two particular kinds of phases are of interest for our purposes: sequential and parallel. During
a sequential phase, $\Gamma(\rho) = 1$, that is to say that the job progresses at a unit rate
whatever amount of processing power it receives (even if it receives no processor at all, i.e.,
even if $\rho = 0$)! During a parallel phase, the job progresses proportionally to the processing
power it receives, i.e., $\Gamma(\rho) = \rho$. Remark that these two kinds of speed-up functions
match the requirements of Edmonds's theorem and thus Equi is $(2+\epsilon)$-speed
$(2+4/\epsilon)$-competitive on instances consisting of a collection of jobs composed of sequential
and parallel phases.

As in [7], we reduce the analysis of our broadcast algorithm B-EquiSet to the analysis of
a non-clairvoyant scheduling algorithm. For that purpose, we need to introduce dependencies 
between the jobs in Edmonds's framework. We consider the following variant of the non-clairvoyant 
scheduling problem. 

Non-Clairvoyant Seq-Par Batches Scheduling. An instance of this variant consists in a collection
$\mathcal{B} = \{B_1, \dots, B_q\}$ of batches $B_j = \{J_{j,1}, \dots, J_{j,u_j}\}$ of jobs with
arrival times $a_1, \dots, a_q$, where each job $J_{j,i}$ is composed of two phases: a sequential
phase of work $w^s_{j,i} \ge 0$ followed by a parallel phase of work $w^p_{j,i} \ge 0$. (Note that
this problem is different from the classical batch scheduling problem in which only one batch has
to be treated.) The scheduler is non-clairvoyant and discovers each batch of jobs at the time of
its arrival and is in particular not aware of the amounts of work of each job in each batch. The
scheduler allocates to each job $J_{j,i}$, arrived and uncompleted at time $t$, a certain amount
$\rho^t_{j,i}$ of the processors ($\rho^t_{j,i}$ is an arbitrary non-negative real number). Let
$t_{j,i}$ denote the completion time of job $J_{j,i}$; $t_{j,i}$ is the first date verifying
$\int_{a_j + w^s_{j,i}}^{t_{j,i}} \rho^t_{j,i}\,dt = w^p_{j,i}$. We say that a batch is completed
as soon as all its jobs are completed; let $t_j$ denote the completion time of batch $B_j$,
$t_j = \max_{i = 1, \dots, u_j} t_{j,i}$. The goal is to minimize the flow time of the batches,
i.e., $\mathcal{B}$-FlowTime $= \sum_{B_j \in \mathcal{B}} (t_j - a_j)$. We denote by
$\mathcal{B}\mathrm{OPT}_s(\mathcal{B})$ the flow time of an optimal $s$-speed schedule for
$\mathcal{B}$.

Similarly to the broadcast setting, we say that a batch $B_j$ (resp., a job $J_{j,i}$) is alive at
time $t$ if $a_j \le t < t_j$ (resp., $a_j \le t < t_{j,i}$).

Equi∘A Algorithms Family. Given a job scheduling algorithm $A$, we define the batches
scheduling algorithm Equi∘A as follows. Let $R(t)$ denote the set of batches that are alive at
time $t$. Equi∘A allots to each batch alive at time $t$ an equal amount of processors, i.e.,
$p/|R(t)|$; then, it runs algorithm $A$ on each alive batch $B_j$ to decide how to split the amount
of processors allotted to $B_j$ among its own alive jobs $J_{j,i}$. In the following, we only
require algorithm $A$ to be fully active, i.e., that it allots at all times all the processors it
is given to the alive jobs (i.e., never idles on purpose). Under this requirement, our results hold
independently of the choice of $A$. Examples of fully active algorithms $A$ are: $A$ = Equi, which
splits the amount of processors equally; or $A$ = MinIdx, which allots all the processors to the
smallest indexed alive job $J_{j,i}$ in $B_j$, i.e.,
$i = \min\{i' : J_{j,i'} \text{ is alive at time } t\}$.
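
The following minimal sketch simulates Equi∘A on a Seq-Par batches instance, for the two fully active choices of $A$ named above. It is an illustration under stated assumptions rather than a reference implementation: the function name `equi_compose`, the `inner` parameter and the data layout are hypothetical. Sequential phases progress at unit rate whatever they receive, so the processors allotted to a job in its sequential phase are simply wasted, which a non-clairvoyant scheduler cannot avoid.

```python
from fractions import Fraction

def equi_compose(batches, p, inner="equi"):
    """Simulate Equi∘A. batches: list of (arrival, [(seq_work, par_work), ...]).

    inner: "equi" (equal split inside a batch) or "minidx" (everything to the
    smallest-indexed alive job). Returns the batch completion times.
    """
    p = Fraction(p)
    order = sorted(range(len(batches)), key=lambda j: batches[j][0])
    alive, done = {}, [None] * len(batches)
    t, nxt = Fraction(0), 0
    while True:
        # Admit the batches that have arrived by time t.
        while nxt < len(order) and batches[order[nxt]][0] <= t:
            j = order[nxt]
            alive[j] = {i: [Fraction(ws), Fraction(wp)]
                        for i, (ws, wp) in enumerate(batches[j][1])}
            nxt += 1
        # Drop completed jobs, then completed batches.
        for j in list(alive):
            alive[j] = {i: w for i, w in alive[j].items() if w[0] > 0 or w[1] > 0}
            if not alive[j]:
                done[j] = t
                del alive[j]
        if not alive:
            if nxt == len(order):
                return done
            t = Fraction(batches[order[nxt]][0])
            continue
        # Equi gives an equal share to each alive batch; A splits it inside the batch.
        share = p / len(alive)
        rho = {}
        for j, jobs in alive.items():
            for i in jobs:
                if inner == "equi":
                    rho[j, i] = share / len(jobs)
                else:  # "minidx"
                    rho[j, i] = share if i == min(jobs) else Fraction(0)
        # Next event: end of a sequential phase, end of a parallel phase, or an arrival.
        steps = []
        for j, jobs in alive.items():
            for i, (ws, wp) in jobs.items():
                if ws > 0:
                    steps.append(ws)
                elif rho[j, i] > 0:
                    steps.append(wp / rho[j, i])
        if nxt < len(order):
            steps.append(Fraction(batches[order[nxt]][0]) - t)
        dt = min(steps)
        for j, jobs in alive.items():
            for i, w in jobs.items():
                if w[0] > 0:
                    w[0] -= dt              # sequential phase: unit rate
                else:
                    w[1] -= rho[j, i] * dt  # parallel phase: rate rho
        t += dt

batches = [(0, [(1, 1), (0, 2)]), (1, [(0, 1)])]
print(equi_compose(batches, 2, "equi"), equi_compose(batches[:1], 2, "minidx"))
```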

Analysis of Equi∘A. To analyze the competitiveness of Equi∘A, we associate to each batches
scheduling instance $\mathcal{B}$ two instances, $\mathcal{J}'$ and $\mathcal{J}''$, of job
scheduling. We first bound the performance of our algorithm Equi∘A on $\mathcal{B}$ from above by
the performance of Equi on $\mathcal{J}'$ (Lemma 4). We then use the "harder" job instance
$\mathcal{J}''$ to show that the job instance $\mathcal{J}'$ was in fact "easier" than the batch
instance $\mathcal{B}$ if one increases slightly the number of processors (Lemmas 5 and 6). Since
Equi is competitive on $\mathcal{J}'$, we can then conclude on the competitiveness of Equi∘A on
$\mathcal{B}$ (Theorem 7).

Consider a Seq-Par batches scheduling instance $\mathcal{B} = \{B_1, \dots, B_q\}$ where each batch
$B_j = \{J_{j,1}, \dots, J_{j,u_j}\}$ arrives at time $a_j$ and each $J_{j,i}$ in $B_j$ consists of
a sequential phase of work $w^s_{j,i}$ followed by a parallel phase of work $w^p_{j,i}$. Consider
the $s$-speed schedule obtained by running algorithm Equi∘A on instance $\mathcal{B}$; let
$\rho^t_{j,i}$ denote the amount of processors allotted by Equi∘A to job $J_{j,i}$ at time $t$, and
$\rho^t_j = \sum_{J_{j,i} \in B_j} \rho^t_{j,i}$ denote the amount of processors allotted to batch
$B_j$ at time $t$; let $t_{j,i}$ (resp., $t_j$) be the completion time of job $J_{j,i}$ (resp.,
batch $B_j$). We define a Seq-Par job scheduling instance $\mathcal{J}' = \{J'_1, \dots, J'_q\}$,
where each job $J'_j$ arrives at time $a_j$, and is composed of a sequential phase of work
$w'^s_j = \max_{J_{j,i} \in B_j} w^s_{j,i}$, followed by a parallel phase of work
$w'^p_j = \int_{a_j + w'^s_j}^{t_j} \rho^t_j\,dt$; intuitively, $w'^s_j$ is the length of the
longest sequential phase among the jobs in $B_j$ and $w'^p_j$ is the total amount of parallel work
in $B_j$ to be scheduled by Equi∘A after the completion of the last sequential phase among the jobs
in $B_j$.

The key to the next lemma is that one gets exactly the same schedule of the jobs in $\mathcal{J}'$
by running algorithm Equi on instance $\mathcal{J}'$ as by allotting at all times to each job
$J'_j$ the same amount of processors as the jobs in $B_j$ received from Equi∘A.

Lemma 4 (Reduction to job scheduling) If $A$ is fully active, then $\mathrm{Equi}_s{\circ}A(\mathcal{B}) = \mathrm{Equi}_s(\mathcal{J}')$.

Proof. As long as the longest sequential phase among the jobs in batch $B_j$ is not completed,
the batch $B_j$ is alive. By construction, job $J'_j$ is also alive as long as this sequential
phase is not completed. Since the amount of processors given to batch $B_j$ in Equi∘A is given by
Equi, and since Equi is non-clairvoyant, Equi∘A allots the same amount of processors to $B_j$ as
Equi allots to $J'_j$ until the completion of the longest sequential phase among the jobs in batch
$B_j$. By construction, the longest sequential phase in batch $B_j$ and the sequential phase of
$J'_j$ end at the same time and at this moment, all the jobs alive in $B_j$ are in their parallel
phase. Thus by construction, the overall amount of remaining parallel work in $B_j$ at that time is
equal to the parallel work assigned to $J'_j$. By construction, the amount of processors given to
$J'_j$ equals the amount of processors allotted to batch $B_j$, which is in turn equal to the total
amount allotted to its remaining alive jobs since $A$ is fully active. The overall remaining amount
of parallel work is thus identical in $J'_j$ and $B_j$ until they complete at the same time. Their
flow times are thus identical in both schedules. We conclude the proof by reasoning inductively on
the completion times (sorted in non-decreasing order) of each phase of each job in each batch. □

We now define the job instance $\mathcal{J}'' = \{J''_1, \dots, J''_q\}$. $\mathcal{J}''$ is a kind
of worst case instance of the batch instance $\mathcal{B}$, where all the parallel work in each
batch $B_j$ has to be scheduled after the longest sequential phase in $B_j$. Job $J''_j$ arrives at
time $a_j$ and consists of a sequential phase of work
$w''^s_j = \max_{J_{j,i} \in B_j} w^s_{j,i}$, followed by a parallel phase of work
$w''^p_j = \sum_{J_{j,i} \in B_j} w^p_{j,i}$.
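
The worst case instance can be built directly from the batches, since it depends only on the amounts of work and not on any schedule. A minimal sketch follows, with a hypothetical function name `worst_case_jobs` and the convention that a batch is given as an arrival time together with a list of (sequential work, parallel work) pairs:

```python
def worst_case_jobs(batches):
    """Map each batch to one job: (arrival, longest sequential work, total parallel work).

    batches: list of (arrival, [(seq_work, par_work), ...])
    """
    return [
        (
            arrival,
            max((ws for ws, _ in jobs), default=0),  # longest sequential phase
            sum(wp for _, wp in jobs),               # all the parallel work of the batch
        )
        for arrival, jobs in batches
    ]

batches = [(0, [(1, 1), (0, 2)]), (1, [(0, 1)])]
print(worst_case_jobs(batches))  # one (arrival, seq, par) triple per batch
```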

Lemma 5 ($\mathcal{J}'$ is easier than $\mathcal{J}''$) $\mathrm{JOPT}_s(\mathcal{J}') \le \mathrm{JOPT}_s(\mathcal{J}'')$.

Proof. Since for all $j$, the sequential works of jobs $J'_j$ and $J''_j$ are identical and the
parallel work in $J'_j$ is bounded from above by the parallel work in $J''_j$, any schedule of
$\mathcal{J}''$ is valid for $\mathcal{J}'$. □

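Lemma 6 ($\mathcal{J}''$ with $\delta$ extra processors is "almost as easy" as $\mathcal{B}$) For all $\delta > 0$,

$\mathrm{JOPT}_{1+\delta}(\mathcal{J}'') \le (1 + 1/\delta)\,\mathcal{B}\mathrm{OPT}_1(\mathcal{B})$.
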

Proof. The proof consists in showing that when $\delta$ extra processors are given, delaying the
completion of each batch $B_j$ by a constant factor, $(1 + 1/\delta)$, allows one to postpone the
schedule of all the parallel job phases in $B_j$ until after the completion of the last sequential
phase in $B_j$, which concludes the proof by construction of $\mathcal{J}''$.

Sort the batches of $\mathcal{B}$ by non-increasing arrival time, i.e., assume
$a_1 \ge a_2 \ge \dots \ge a_q$. Consider an optimal schedule $\mathcal{B}\mathrm{OPT}_1$ of batches
$B_1, \dots, B_q$ on one processor. We show by induction that there exists a schedule $\sigma$ of
$\mathcal{J}''$ on $1 + \delta$ processors such that each job $J''_j$ completes before time
$t_j + f_j/\delta$, where $t_j$ and $f_j = t_j - a_j$ denote the completion time and the flow time
of $B_j$ in $\mathcal{B}\mathrm{OPT}_1$, respectively. We now show that the parallel phase of each
job $J''_j$ can be scheduled between time $t_j$ and $t_j + f_j/\delta$; this concludes the proof
since, by construction, the sequential phase of $J''_j$ is necessarily completed before $t_j$.
Start with the first job $J''_1$. Clearly, $w''^p_1 \le f_1$. Thus, the total parallel phase of
$J''_1$ can be scheduled on the $\delta$ extra processors between time $t_1$ and
$t_1 + f_1/\delta$. Assume now that the parallel phases of jobs $J''_1, \dots, J''_{j-1}$ have been
scheduled in $\sigma$ during the time intervals $[t_1, t_1 + f_1/\delta], \dots,
[t_{j-1}, t_{j-1} + f_{j-1}/\delta]$ respectively, and consider job $J''_j$. Since the jobs are
considered in non-increasing arrival times, each job $J''_k$ whose parallel phase has been
scheduled in $\sigma$ between $t_j$ and $t_j + f_j/\delta$ arrived in the time interval
$T = [a_j, t_j + f_j/\delta]$ and furthermore $t_k \le t_j + f_j/\delta$. The total parallel work
$W$ of all the jobs currently scheduled in $\sigma$ during $T$ is then in fact scheduled completely
in $\mathcal{B}\mathrm{OPT}_1$ during $T$. Note that the parallel work of $J''_j$ was also scheduled
in $\mathcal{B}\mathrm{OPT}_1$ during this time interval. Since $\mathcal{B}\mathrm{OPT}_1$ uses
only one processor, we conclude that
$W + w''^p_j \le t_j + f_j/\delta - a_j = (1 + 1/\delta) f_j$. As one can schedule up to
$(1 + \delta) f_j/\delta = (1 + 1/\delta) f_j$ parallel work between time $t_j$ and
$t_j + f_j/\delta$ on $1 + \delta$ processors, the parallel work $w''^p_j$ of $J''_j$ can be
scheduled in $\sigma$ on time. □
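
The counting step at the end of the proof can be summarized as follows, using only $f_j = t_j - a_j$ and the fact that a single processor performs at most $|T|$ units of work during an interval $T$:

```latex
\begin{align*}
W + w''^{p}_j
  &\le |T| = \Bigl(t_j + \frac{f_j}{\delta}\Bigr) - a_j
   = \Bigl(1 + \frac{1}{\delta}\Bigr) f_j
  && \text{(work of one processor during } T\text{)} \\
(1 + \delta) \cdot \frac{f_j}{\delta}
  &= \Bigl(1 + \frac{1}{\delta}\Bigr) f_j
  && \text{(capacity of } 1 + \delta \text{ processors on } [t_j,\, t_j + f_j/\delta]\text{)}
\end{align*}
```

The work to be placed in the interval therefore never exceeds its capacity.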

We can now conclude the analysis of Equi∘A.

Theorem 7 (Competitiveness of Equi∘A) For all $\epsilon > 0$ and $\delta > 0$, Equi∘A is a
$(2+\epsilon)(1+\delta)$-speed $(2+4/\epsilon)(1+1/\delta)$-competitive algorithm for the
Non-Clairvoyant Seq-Par Batches Scheduling problem.
Proof. We use the result of [8] on the competitiveness of Equi for the non-clairvoyant job
scheduling problem to conclude the proof:

$\mathrm{Equi}_{(2+\epsilon)(1+\delta)}{\circ}A(\mathcal{B}) = \mathrm{Equi}_{(2+\epsilon)(1+\delta)}(\mathcal{J}')$   (Lemma 4)
$\le (2 + 4/\epsilon)\,\mathrm{JOPT}_{1+\delta}(\mathcal{J}')$   (Theorem 1 in [8])
$\le (2 + 4/\epsilon)\,\mathrm{JOPT}_{1+\delta}(\mathcal{J}'')$   (Lemma 5)
$\le (2 + 4/\epsilon)(1 + 1/\delta)\,\mathcal{B}\mathrm{OPT}_1(\mathcal{B})$.   (Lemma 6)   □

5 Competitiveness of B-EquiSet 

Consider an instance of the online data broadcast problem with dependencies: a set
$\mathcal{S} = \{S_1, \dots, S_q\}$ of $q$ requests with arrival times $a_1, \dots, a_q$, over $n$
items $I_1, \dots, I_n$ of lengths $\ell_1, \dots, \ell_n$. Let $\mathcal{E}_s$ be the $s$-speed
schedule designed by B-EquiSet on instance $\mathcal{S}$, and B-EquiSet$_s(\mathcal{S})$ be its
flow time. Let $\mathcal{O}_1$ be a 1-speed optimal schedule of $\mathcal{S}$, and
$\mathrm{BOPT}_1(\mathcal{S})$ be its flow time.

Following the steps of [7], we define an instance $\mathcal{B}$ of non-clairvoyant seq-par batches
scheduling from $\mathcal{E}_s$ and $\mathcal{O}_1$, such that the performance of B-EquiSet on
$\mathcal{S}$ can be compared to the performance of Equi∘A on $\mathcal{B}$ for a particular
fully active algorithm $A$. More precisely, we construct $\mathcal{B}$ such that 1) the flow time
of Equi∘A on $\mathcal{B}$ bounds from above the flow time of B-EquiSet on $\mathcal{S}$ and 2) the
(batches) optimal flow time for $\mathcal{B}$ is at most the (broadcast) optimal flow time for
$\mathcal{S}$ if it is given extra resources. Since Equi∘A is competitive, we can then bound the
performance of B-EquiSet with respect to the (batches) optimal flow time of $\mathcal{B}$, which is
by 2) bounded by the (broadcast) optimal flow time of $\mathcal{S}$.

The intuition behind the construction of $\mathcal{B}$ is the following. A batch of all-new jobs is
created for each newly arrived request, with one job per requested item. Each job $J$ stays alive
until its corresponding item $I$ is served in $\mathcal{E}_s$. $J$ is assigned at most two phases
depending on the relative service times of $I$ in $\mathcal{E}_s$ and $\mathcal{O}_1$. The
sequential phase of $J$ lasts until either $I$ is served in $\mathcal{E}_s$, or the broadcast of
$I$ starts in $\mathcal{O}_1$. Intuitively, this means that it is useless to assign processors to
$J$ before the optimal schedule does. At the end of its sequential phase, if $J$ is still alive,
its parallel phase starts and lasts until the broadcast of $I$ is completed in $\mathcal{E}_s$; the
parallel work for $J$ is thus defined as the total amount of bandwidth that its corresponding item
$I$ received within $J$'s corresponding (broadcast) request in B-EquiSet. By construction, with a
suitable choice of $A$, Equi∘A constructs the exact same schedule as B-EquiSet and claim 1) is
verified. Concerning claim 2), the key is to consider the jobs corresponding to the broadcast
requests for a given item $I$ that are served by a given broadcast of $I$ in $\mathcal{O}_1$
starting at some time $t$. The only jobs among them that will receive a parallel phase are the ones
for which the broadcast of $I$ in $\mathcal{E}_s$ starts just before or just after $t$. By
construction, the total amount of parallel work assigned to these jobs corresponds to the bandwidth
assigned to the two broadcasts of item $I$ by $\mathcal{E}_s$ that start just before and just after
time $t$, each of them being bounded by the length of $I$. The total amount of parallel work in the
jobs for which the broadcast of the corresponding item $I$ starts in $\mathcal{O}_1$ at some time
$t$ is then bounded by twice the length of $I$, and can thus be scheduled during the broadcast of
$I$ in $\mathcal{O}_1$ if one doubles the number of processors, which proves claim 2).

The following formalizes the reasoning exposed above. 

The Job Set Instance J. Recall the broadcast instance $\mathcal{S}$, and the two broadcast schedules $\mathcal{E}_s$
and $\mathcal{O}_1$, defined at the beginning of this section, as well as the notations given in Section El. In
particular, let $C_s(I_i, t)$ denote the completion time of the broadcast of item $I_i$ that starts just after
$t$ in $\mathcal{E}_s$, and $B_1(I_i, t)$ be the time of the beginning of the first broadcast of item $I_i$ that starts after
$t$ in $\mathcal{O}_1$ (see Section EJ). Recall the description of algorithm B-EquiSet in Section 01: at time $t$,
let $R(t)$ be the set of alive requests; B-EquiSet splits equally the bandwidth $s$ among the alive
requests and, for each alive request $S_j$, it assigns an arbitrary rate $r_{j,i}(t)$ to each alive item $I_i$ in
$S_j$ such that $\sum_{I_i \in S_j} r_{j,i}(t) = s/|R(t)|$. B-EquiSet then broadcasts each item $I_i$ at a rate
$r_i(t) = \sum_j r_{j,i}(t)$ at time $t$, the sum ranging over the alive requests $S_j$ that contain $I_i$.
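The rate assignment recalled above can be illustrated by a minimal Python sketch. It is not the authors' implementation: the function name `b_equiset_rates`, the dictionary layout, and the rule used to split a request's share among its missing items (the text only says the split is arbitrary) are assumptions made for this example.

```python
from collections import defaultdict


def b_equiset_rates(alive_requests, s):
    """Return (r_ji, r_i) for one time instant.

    alive_requests: dict mapping a request id j to the non-empty set of
    item ids still missing in S_j.  s: total bandwidth.
    """
    r_ji = {}
    r_i = defaultdict(float)
    if not alive_requests:
        return r_ji, {}
    share = s / len(alive_requests)  # equal split: s / |R(t)|
    for j, items in alive_requests.items():
        # The split inside a request is arbitrary; ASSUMPTION for this
        # sketch: give the whole share to the smallest missing item id.
        chosen = min(items)
        for i in items:
            r_ji[(j, i)] = share if i == chosen else 0.0
            r_i[i] += r_ji[(j, i)]  # item rate = sum over requests
    return r_ji, dict(r_i)
```

Any other rule for dividing the share inside a request would fit the description equally well, as long as the rates of each alive request sum to $s/|R(t)|$.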

Given $\mathcal{S}$, $\mathcal{E}_s$ and $\mathcal{O}_1$, we define the non-clairvoyant batches scheduling instance $\mathcal{B} = \{B_1, \ldots, B_q\}$,
where each batch $B_j$ is released at the same time as $S_j$, i.e. at time $a_j$, and contains one seq-par
job $J_{j,i}$ for each item $I_i \in S_j$ (note that the indices $i$ of the jobs $J_{j,i}$ in each batch $B_j$ may not be
consecutive depending on the content of $S_j$). Each job $J_{j,i}$ consists of a sequential phase of work
$w^s_{j,i} = \min\{C_s(I_i, a_j), B_1(I_i, a_j)\} - a_j$, followed by a parallel phase of work $w^p_{j,i}$. If $C_s(I_i, a_j) \le B_1(I_i, a_j)$,
then $w^p_{j,i} = 0$; otherwise, $w^p_{j,i} = \int_{B_1(I_i, a_j)}^{C_s(I_i, a_j)} r_{j,i}(t)\,dt + \eta$, where $\eta$ is an infinitely small
amount of work. That is, if the download of item $I_i$ in request $S_j$ is completed in $\mathcal{E}_s$ after it starts in
$\mathcal{O}_1$, then the amount of parallel work assigned to $J_{j,i}$ is just slightly higher than the total amount of
bandwidth allotted to item $I_i$ within the bandwidth allotted to request $S_j$ by B-EquiSet$_s$, after
the beginning of the corresponding broadcast in $\mathcal{O}_1$. Adding an infinitely small amount of work $\eta$
to the parallel phase of $J_{j,i}$ does not change the optimal batches schedule (except on a negligible
(discrete) set of dates), but since the algorithm Equi$\circ A$ is non-clairvoyant, this ensures that the job
$J_{j,i}$ remains alive until the broadcast of item $I_i$ completes even if B-EquiSet$_s$ deliberately chooses
not to broadcast item $I_i$ in the bandwidth allotted to request $S_j$ (the introduction of infinitely small
extra load can be rigorously formalized by adding an exponentially decreasing extra load $\gamma/2^k$ to
the $k$th requested job for a small enough $\gamma$).
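To make the two-phase definition concrete, the following Python sketch computes the sequential and parallel works of a single job. It is only an illustration under stated assumptions: the function name `seq_par_work` and its parameters are hypothetical, the rate $r_{j,i}$ is passed as a plain function of time, the integral is approximated by a midpoint sum, and the infinitely small work $\eta$ is replaced by a small positive constant.

```python
def seq_par_work(a_j, completion_in_E, start_in_O, rate, eta=1e-9, steps=10_000):
    """Return (sequential work, parallel work) of one seq-par job.

    a_j: arrival time of the request.
    completion_in_E: completion time of the item in the B-EquiSet schedule.
    start_in_O: start time of the item's broadcast in the optimal schedule.
    rate: function t -> rate given to the item within the request.
    eta: ASSUMPTION, finite stand-in for the infinitely small extra work.
    """
    w_seq = min(completion_in_E, start_in_O) - a_j
    if completion_in_E <= start_in_O:
        return w_seq, 0.0  # served before the optimal broadcast starts
    # Midpoint rule for the integral of rate over [start_in_O, completion_in_E].
    h = (completion_in_E - start_in_O) / steps
    area = sum(rate(start_in_O + (k + 0.5) * h) for k in range(steps)) * h
    return w_seq, area + eta
```

For instance, a request arriving at time 0 whose item completes at time 5 in the B-EquiSet schedule, while the optimal broadcast starts at time 2, gets a sequential work of 2 and a parallel work equal to the bandwidth received between times 2 and 5 (plus the stand-in for $\eta$).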

Lemma 8 There exists a fully-active algorithm $A$ such that: B-EquiSet$_s(\mathcal{S}) \le$ Equi$_s \circ A(\mathcal{B})$.

Proof. The proof follows the lines of [7]. Given an amount of processors $p$ for an alive batch
$B_j$, algorithm $A$ assigns to each alive job $J_{j,i}$ in $B_j$ at time $t$ the same amount of processors as
B-EquiSet$_s$ would have assigned at time $t$ to the corresponding alive item $I_i$ of the corresponding
alive request $S_j$, had $S_j$ been assigned a bandwidth $p$. Since B-EquiSet$_s$ allots all
the bandwidth available to alive jobs, $A$ is fully-active. Now, since $\eta$ is infinitely small, this extra
load does not affect the allocation of processors computed by Equi$_s \circ A$ except over a negligible
(discrete) set of dates. By immediate induction, each job $J_{j,i}$ remains alive in the schedule computed
by Equi$_s \circ A$ as long as item $I_i$ is alive in request $S_j$ in $\mathcal{E}_s$. This is clear as long as $J_{j,i}$ is in its
sequential phase. Once $J_{j,i}$ enters its parallel phase, as long as the broadcast of item $I_i$ is not
completed, either $I_i$ is broadcast by B-EquiSet$_s$ in request $S_j$ and $J_{j,i}$ is scheduled by Equi$_s \circ A$
($A$ copies B-EquiSet$_s$), or B-EquiSet$_s$ deliberately chooses not to broadcast the alive item $I_i$
and, since $J_{j,i}$ has an infinitely small amount of extra work, $J_{j,i}$ remains alive in Equi$_s \circ A$ as well.
The flow time of each job $J_{j,i}$ is then at least the flow time of the corresponding item $I_i$ in $\mathcal{E}_s$; we
conclude that each batch $B_j$ completes in Equi$_s \circ A$ no earlier than its corresponding request $S_j$ in
B-EquiSet$_s$. □

Lemma 9 There exists a 2-speed batches schedule $T_2$ such that: $T_2(\mathcal{B}) \le$ B-FlowTime$(\mathcal{O}_1)$.

Proof. Again, the proof follows the lines of [7]. Consider an item $I_i$. We partition the requests
$S_j$ containing item $I_i$ into classes $C_1, C_2, \ldots$, one for each broadcast of $I_i$ in $\mathcal{O}_1$. The $k$th class $C_k$
contains all the requests $S_j$ that download $I_i$ in $\mathcal{O}_1$ during its $k$th broadcast, i.e. all requests $S_j$
such that $b_1(I_i, k-1) < a_j \le b_1(I_i, k)$ (see Section El for notations). We show that for all $k$, the
total parallel phases of the jobs $J_{j,i}$ such that $S_j \in C_k$ can be shoehorned into twice the area of
bandwidth allotted by $\mathcal{O}_1$ to the $k$th broadcast of item $I_i$. Since this holds for all $i$ and all $k$, we
obtain a 2-speed schedule $T_2$ such that $T_2(\mathcal{B}) \le$ B-FlowTime$(\mathcal{O}_1)$.

Let $t_1 = b_1(I_i, k)$ be the time of the beginning of the $k$th broadcast of $I_i$ in $\mathcal{O}_1$. Consider a
request $S_j$ in class $C_k$; clearly $a_j \le t_1$. By construction, job $J_{j,i}$ is assigned a non-zero parallel
work only if $S_j$ completes the download of $I_i$ after $t_1$ in B-EquiSet$_s$. Since $S_j$ arrives before
$t_1$, it downloads $I_i$ during one of the two broadcasts of $I_i$ in B-EquiSet$_s$ that start just before
or just after $t_1$; let $C_k^-$ (resp. $C_k^+$) be the set of requests served by the broadcast that starts just
before $t_1$ (resp. just after $t_1$). Let $t_2$ and $t_3$ be the completion times of the broadcasts of $I_i$ in
B-EquiSet$_s$ that start just before and just after $t_1$ respectively. By construction, the total amounts
$W^-$ and $W^+$ of parallel work assigned to the jobs $J_{j,i}$ such that $S_j \in C_k^-$ and $S_j \in C_k^+$ are respectively
(up to the infinitely small terms $\eta$):

$W^- = \sum_{j : S_j \in C_k^-} \int_{t_1}^{t_2} r_{j,i}(t)\,dt$ and $W^+ = \sum_{j : S_j \in C_k^+} \int_{t_1}^{t_3} r_{j,i}(t)\,dt$.

Let us rewrite $W^- + W^+ = R_1 + R_2$ with

$R_1 = \int_{t_1}^{t_2} \sum_{j : S_j \in C_k} r_{j,i}(t)\,dt \le \int_{t_1}^{t_2} r_i(t)\,dt$ and $R_2 = \int_{t_2}^{t_3} \sum_{j : S_j \in C_k^+} r_{j,i}(t)\,dt \le \int_{t_2}^{t_3} r_i(t)\,dt$.

$R_1$ and $R_2$ are thus at most the total area allotted to item $I_i$ by B-EquiSet$_s$ during the broadcasts of
$I_i$ that start just before and just after $t_1$; since a broadcast is completed as soon as the rates sum up
to the length of the item, $R_1 \le \ell_i$ and $R_2 \le \ell_i$, and thus $W^- + W^+ \le 2\ell_i$. Since $\mathcal{O}_1$ allots a total
bandwidth of $\ell_i$ to broadcast item $I_i$ after time $t_1$, and since the parallel works of the jobs $J_{j,i}$ such
that $S_j \in C_k$ are released at time $t_1$ and sum up to a total $W^- + W^+ \le 2\ell_i$, one can construct on
2 processors a 2-speed schedule $T_2$ in which the parallel phase of each of these jobs $J_{j,i}$ completes
before the $k$th broadcast of $I_i$ completes in $\mathcal{O}_1$.

Since no processor needs to be allotted to the sequential phases, repeating the construction
for each item $I_i$ yields a valid 2-speed schedule $T_2$ in which each job $J_{j,i}$ completes before the
corresponding request $S_j$ completes the download of $I_i$ in $\mathcal{O}_1$. It follows that each batch $B_j$ is
completed in $T_2$ before its corresponding request $S_j$ is served by $\mathcal{O}_1$. □
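The bound on the total parallel work used in this proof can be spelled out in one chain. The block below is a sketch under the following assumptions: $t_1 \le t_2 \le t_3$; both parallel works are integrated from $t_1$; the infinitely small terms $\eta$ are neglected; and the requests of $C_k$ outside $C_k^- \cup C_k^+$ receive no rate for $I_i$ after $t_1$, so that summing over $C_k^- \cup C_k^+$ or over $C_k$ gives the same value.

```latex
\begin{align*}
W^- + W^+
  &= \sum_{j : S_j \in C_k^-} \int_{t_1}^{t_2} r_{j,i}(t)\,dt
   + \sum_{j : S_j \in C_k^+} \left( \int_{t_1}^{t_2} r_{j,i}(t)\,dt
                                   + \int_{t_2}^{t_3} r_{j,i}(t)\,dt \right) \\
  &= \underbrace{\int_{t_1}^{t_2} \sum_{j : S_j \in C_k^- \cup C_k^+} r_{j,i}(t)\,dt}_{R_1}
   + \underbrace{\int_{t_2}^{t_3} \sum_{j : S_j \in C_k^+} r_{j,i}(t)\,dt}_{R_2} \\
  &\le \int_{t_1}^{t_2} r_i(t)\,dt + \int_{t_2}^{t_3} r_i(t)\,dt
   \;\le\; \ell_i + \ell_i \;=\; 2\ell_i .
\end{align*}
```

The first line splits the integral of each request in $C_k^+$ at $t_2$; the last line uses that each of the two broadcasts of $I_i$ receives a total area of at most $\ell_i$.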

We now conclude with the proof of the main theorem. 



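Proof of Theorem ^ Setting $s = (4 + \epsilon)(1 + \delta)$, the competitiveness of Equi$\circ A$ (Theorem Cj)
concludes the result:

B-EquiSet$_{(4+\epsilon)(1+\delta)}(\mathcal{S}) \le$ Equi$_{(4+\epsilon)(1+\delta)} \circ A(\mathcal{B}) \le (2 + 8/\epsilon)(1 + 1/\delta)\, \mathcal{B}$OPT$_2(\mathcal{B})$
$\le (2 + 8/\epsilon)(1 + 1/\delta)\, T_2(\mathcal{B}) \le (2 + 8/\epsilon)(1 + 1/\delta)$ BOPT$_1(\mathcal{S})$,

where the first inequality follows from Lemma 8, the second from the competitiveness of Equi$\circ A$,
and the last from Lemma 9. □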
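As a quick numerical illustration of the trade-off between speed and competitive ratio in this bound, the following Python sketch evaluates $s = (4+\epsilon)(1+\delta)$ and $(2+8/\epsilon)(1+1/\delta)$ for given parameters. The function name `b_equiset_guarantee` is hypothetical and only serves this example.

```python
def b_equiset_guarantee(eps, delta):
    """Return (speed, ratio) with speed = (4 + eps)(1 + delta) and
    ratio = (2 + 8/eps)(1 + 1/delta), as in the chain of inequalities."""
    if eps <= 0 or delta <= 0:
        raise ValueError("eps and delta must be positive")
    speed = (4 + eps) * (1 + delta)
    ratio = (2 + 8 / eps) * (1 + 1 / delta)
    return speed, ratio
```

For example, `b_equiset_guarantee(1, 1)` gives a speed of 10 and a ratio of 20, while smaller values of `eps` and `delta` bring the speed closer to 4 at the price of a larger ratio.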

The B-EquiSet-Edf algorithm. We apply the same method as in 0. Let $s = (4 + \epsilon)(1 + \delta)^2$
and $c = (2 + 8/\epsilon)(1 + 1/\delta)^2$. B-EquiSet-Edf simulates the $s/(1 + \delta)$-speed execution of
B-EquiSet and, at each time $t$ such that the broadcast of an item $I_i$ in B-EquiSet is completed,
it releases an item $I'_i$ of length $\ell_i$ with a deadline $t + (t - t')/\delta$, where $t'$ is the time of the beginning
of the considered broadcast of $I_i$ in B-EquiSet. Then, B-EquiSet-Edf schedules on a
bandwidth $s$ each item $I'_i$ according to the earliest-deadline-first policy. With an argument similar
to Lemma ini or [7], one can show that a feasible schedule of the items $I'_i$ exists and thus that
earliest-deadline-first constructs it, which ensures that B-EquiSet-Edf is $s$-speed $c$-competitive.
Since earliest-deadline-first preempts the broadcast of an item only when a new item arrives,
B-EquiSet-Edf preempts each broadcast at most once on average. Note that one can avoid
long idle periods in B-EquiSet-Edf's schedule by broadcasting an arbitrary item $I_i$ alive in
B-EquiSet at time $t$ if no item $I'_i$ is currently alive.
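The release rule and the earliest-deadline-first step can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the function name `edf_schedule` and the input format are assumptions, the trace of broadcasts completed by the simulated B-EquiSet run is taken as given, and the sketch only reports finish times next to deadlines without proving feasibility.

```python
import heapq


def edf_schedule(broadcasts, delta, s):
    """Schedule released items by earliest deadline first on bandwidth s.

    broadcasts: list of (item, t_start, t_end, length), one entry per
    broadcast completed in the simulated run (ASSUMED to be given).
    Returns {index in broadcasts: (finish_time, deadline)}.
    """
    # Item k is released at t_end with deadline t_end + (t_end - t_start)/delta.
    jobs = sorted(
        (t_end, t_end + (t_end - t_start) / delta, k, length)
        for k, (_item, t_start, t_end, length) in enumerate(broadcasts)
    )
    ready = []  # heap of [deadline, k, remaining length]
    result = {}
    now, nxt = 0.0, 0
    while nxt < len(jobs) or ready:
        if not ready:
            now = max(now, jobs[nxt][0])  # idle until the next release
        while nxt < len(jobs) and jobs[nxt][0] <= now:
            _release, deadline, k, length = jobs[nxt]
            heapq.heappush(ready, [deadline, k, length])
            nxt += 1
        deadline, k, remaining = ready[0]
        horizon = jobs[nxt][0] if nxt < len(jobs) else float("inf")
        finish = now + remaining / s
        if finish <= horizon:
            heapq.heappop(ready)
            result[k] = (finish, deadline)
            now = finish
        else:
            # Run until the next release; a preemption can only occur there.
            ready[0][2] = remaining - (horizon - now) * s
            now = horizon
    return result
```

The scheduler only reconsiders its choice at release times, which mirrors the remark that earliest-deadline-first preempts a broadcast only when a new item arrives.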

Concluding remarks. Several directions are possible to extend this work. First, B-EquiSet
does not have a precise policy to decide in which order one should broadcast the items within each
requested set; deciding on a particular policy may lead to better performance (bandwidth and/or
competitive ratio). Second, it might be interesting to design a longest-wait-first greedy algorithm in
the presence of dependencies; B-EquiSet shows that the items should not simply receive bandwidth
according to the number of outstanding requested sets for this item (the allotted bandwidth depends
also on the number of outstanding items within each outstanding set); it is thus a challenging
question to design proper weights to aggregate the current waits of the requested sets including a
given item.
A Omitted proof



Proof of Fact [31 We use Yao's principle (see QSlEl) to extend the result to randomized algorithms.
We consider the following probabilistic distribution of request sets over $n$ items: $1 + \sqrt{n}$
requests arrive at time $t = 0$; one request asks for a uniform random subset of size $n - \sqrt{n}$
of the $n$ items; and each of the requests $S_1, \ldots, S_{\sqrt{n}}$ asks for one random distinct item among the
$\sqrt{n}$ remaining items. Consider again any deterministic algorithm $A$ with bandwidth $s$. Since $A$ is
deterministic and ignores the dependencies, the schedule designed by $A$ is independent of
the random instance. At time $t = n/(2s)$, the broadcast of at least $n/2$ items is not completed.
Thus, the probability that request $S_j$, for $j \ge 1$, asks for one of these items is at least $1/2$. Then,
the expected number of unsatisfied requests at time $t = n/(2s)$ is at least $\sqrt{n}/2$. We conclude that
the expected flow time of any deterministic algorithm with bandwidth $s$ under this distribution
of requests is at least $n\sqrt{n}/(4s)$. According to Yao's principle, the worst expected flow time of any
randomized algorithm over the collection of all the considered instances is at least $n\sqrt{n}/(4s)$. But
BOPT$_1 \sim \frac{3}{2} n$, which concludes that no randomized algorithm is $s$-speed $c$-competitive, for all $s$ and
$c < \sqrt{n}/(6s)$. □
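The arithmetic behind this gap can be checked with a short Python sketch. It assumes unit-length items and a perfect-square $n$ (both assumptions of this example, chosen to match the counting in the proof); the function name `lower_bound_gap` is hypothetical. The dependency-aware schedule evaluated here broadcasts the $\sqrt{n}$ singleton items first and the large set afterwards, which gives an upper bound on BOPT$_1$.

```python
import math


def lower_bound_gap(n, s):
    """Compare a dependency-aware 1-speed schedule with the lower bound
    n*sqrt(n)/(4s) on dependency-oblivious algorithms with bandwidth s.

    ASSUMPTIONS: unit-length items, n is a perfect square.
    Returns (aware_flow_time, oblivious_lower_bound, ratio).
    """
    root = math.isqrt(n)
    if root * root != n:
        raise ValueError("n must be a perfect square")
    # Singletons complete at times 1, 2, ..., sqrt(n); the big set at time n.
    aware = sum(range(1, root + 1)) + n
    oblivious = n * root / (4 * s)
    return aware, oblivious, oblivious / aware
```

With `n = 10000` and `s = 1`, the aware schedule has flow time 15050 (close to $\frac{3}{2} n$) and the lower bound is 250000, a ratio slightly below $\sqrt{n}/(6s) \approx 16.7$.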